AITopics | causal abstraction

Collaborating Authors

causal abstraction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

4f5c422f4d49a5a807eda27434231040-Paper.pdf

Neural Information Processing SystemsFeb-8-2026, 15:17:31 GMT

hypothesis, intervention, representation, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Italy > Tuscany > Florence (0.04)
(10 more...)

Genre: Research Report (0.46)

Industry: Transportation (0.33)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.72)

Add feedback

Triangulation as an Acceptance Rule for Multilingual Mechanistic Interpretability

Long, Yanan

arXiv.org Machine LearningJan-1-2026

Multilingual language models achieve strong aggregate performance yet often behave unpredictably across languages, scripts, and cultures. We argue that mechanistic explanations for such models should satisfy a \emph{causal} standard: claims must survive causal interventions and must \emph{cross-reference} across environments that perturb surface form while preserving meaning. We formalize \emph{reference families} as predicate-preserving variants and introduce \emph{triangulation}, an acceptance rule requiring necessity (ablating the circuit degrades the target behavior), sufficiency (patching activations transfers the behavior), and invariance (both effects remain directionally stable and of sufficient magnitude across the reference family). To supply candidate subgraphs, we adopt automatic circuit discovery and \emph{accept or reject} those candidates by triangulation. We ground triangulation in causal abstraction by casting it as an approximate transformation score over a distribution of interchange interventions, connect it to the pragmatic interpretability agenda, and present a comparative experimental protocol across multiple model families, language pairs, and tasks. Triangulation provides a falsifiable standard for mechanistic claims that filters spurious circuits passing single-environment tests but failing cross-lingual invariance.

intervention, large language model, machine learning, (18 more...)

arXiv.org Machine Learning

2512.24842

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Causal Abstractions of Neural Networks

Neural Information Processing SystemsDec-24-2025, 02:47:10 GMT

Structural analysis methods (e.g., probing and feature attribution) are increasingly important tools for neural network analysis. We propose a new structural analysis method grounded in a formal theory of causal abstraction that provides rich characterizations of model-internal representations and their roles in input/output behavior. In this method, neural representations are aligned with variables in interpretable causal models, and then interchange interventions are used to experimentally verify that the neural representations have the causal properties of their aligned variables. We apply this method in a case study to analyze neural models trained on Multiply Quantified Natural Language Inference (MQNLI) corpus, a highly complex NLI dataset that was constructed with a tree-structured natural logic causal model. We discover that a BERT-based model with state-of-the-art performance successfully realizes parts of the natural logic model's causal structure, whereas a simpler baseline model fails to show any such structure, demonstrating that neural representations encode the compositional structure of MQNLI examples.

causal abstraction, name change, neural network, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

Add feedback

CAuSE: Decoding Multimodal Classifiers using Faithful Natural Language Explanation

Bandyopadhyay, Dibyanayan, Bhattacharjee, Soham, Hasanuzzaman, Mohammed, Ekbal, Asif

arXiv.org Artificial IntelligenceDec-9-2025

Multimodal classifiers function as opaque black box models. While several techniques exist to interpret their predictions, very few of them are as intuitive and accessible as natural language explanations (NLEs). To build trust, such explanations must faithfully capture the classifier's internal decision making behavior, a property known as faithfulness. In this paper, we propose CAuSE (Causal Abstraction under Simulated Explanations), a novel framework to generate faithful NLEs for any pretrained multimodal classifier. We demonstrate that CAuSE generalizes across datasets and models through extensive empirical evaluations. Theoretically, we show that CAuSE, trained via interchange intervention, forms a causal abstraction of the underlying classifier. We further validate this through a redesigned metric for measuring causal faithfulness in multimodal settings. CAuSE surpasses other methods on this metric, with qualitative analysis reinforcing its advantages. We perform detailed error analysis to pinpoint the failure cases of CAuSE. For replicability, we make the codes available at https://github.com/newcodevelop/CAuSE

explanation, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2512.06814

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Industry:

Transportation (0.76)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?

Sutter, Denis, Minder, Julian, Hofmann, Thomas, Pimentel, Tiago

arXiv.org Artificial IntelligenceNov-13-2025

The concept of causal abstraction got recently popularised to demystify the opaque decision-making processes of machine learning models; in short, a neural network can be abstracted as a higher-level algorithm if there exists a function which allows us to map between them. Notably, most interpretability papers implement these maps as linear functions, motivated by the linear representation hypothesis: the idea that features are encoded linearly in a model's representations. However, this linearity constraint is not required by the definition of causal abstraction. In this work, we critically examine the concept of causal abstraction by considering arbitrarily powerful alignment maps. In particular, we prove that under reasonable assumptions, any neural network can be mapped to any algorithm, rendering this unrestricted notion of causal abstraction trivial and uninformative. We complement these theoretical findings with empirical evidence, demonstrating that it is possible to perfectly map models to algorithms even when these models are incapable of solving the actual task; e.g., on an experiment using randomly initialised language models, our alignment maps reach 100\% interchange-intervention accuracy on the indirect object identification task. This raises the non-linear representation dilemma: if we lift the linearity constraint imposed to alignment maps in causal abstraction analyses, we are left with no principled way to balance the inherent trade-off between these maps' complexity and accuracy. Together, these results suggest an answer to our title's question: causal abstraction is not enough for mechanistic interpretability, as it becomes vacuous without assumptions about how models encode information. Studying the connection between this information-encoding assumption and causal abstraction should lead to exciting future work.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.08802

Country:

Asia (1.00)
North America > United States (0.67)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Regional Government (0.45)
Transportation > Ground (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

26b8e3dc3a21fcd660d80c63b767f324-Paper-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 21:16:38 GMT

abstraction, intervention, simulator, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Virginia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (1.00)
Banking & Finance > Economy (1.00)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Causal Abstractions, Categorically Unified

Englberger, Markus, Dhami, Devendra Singh

arXiv.org Machine LearningOct-7-2025

We present a categorical framework for relating causal models that represent the same system at different levels of abstraction. We define a causal abstraction as natural transformations between appropriate Markov functors, which concisely consolidate desirable properties a causal abstraction should exhibit. Our approach unifies and generalizes previously considered causal abstractions, and we obtain categorical proofs and generalizations of existing results on causal abstractions. Using string diagrammatical tools, we can explicitly describe the graphs that serve as consistent abstractions of a low-level graph under interventions. We discuss how methods from mechanistic interpretability, such as circuit analysis and sparse autoencoders, fit within our categorical framework. We also show how applying do-calculus on a high-level graphical abstraction of an acyclic-directed mixed graph (ADMG), when unobserved confounders are present, gives valid results on the low-level graph, thus generalizing an earlier statement by Anand et al. (2023). We argue that our framework is more suitable for modeling causal abstractions compared to existing categorical frameworks. Finally, we discuss how notions such as $τ$-consistency and constructive $τ$-abstractions can be recovered with our framework.

abstraction, causal abstraction, graphical abstraction, (12 more...)

arXiv.org Machine Learning

2510.05033

Country: Europe > Netherlands > North Brabant > Eindhoven (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.34)

Add feedback

Using causal abstractions to accelerate decision-making in complex bandit problems

Dyer, Joel, Bishop, Nicholas, Calinescu, Anisoara, Wooldridge, Michael, Zennaro, Fabio Massimo

arXiv.org Artificial IntelligenceSep-5-2025

Although real-world decision-making problems can often be encoded as causal multi-armed bandits (CMABs) at different levels of abstraction, a general methodology exploiting the information and computational advantages of each abstraction level is missing. In this paper, we propose AT-UCB, an algorithm which efficiently exploits shared information between CMAB problem instances defined at different levels of abstraction. More specifically, AT-UCB leverages causal abstraction (CA) theory to explore within a cheap-to-simulate and coarse-grained CMAB instance, before employing the traditional upper confidence bound (UCB) algorithm on a restricted set of potentially optimal actions in the CMAB of interest, leading to significant reductions in cumulative regret when compared to the classical UCB algorithm. We illustrate the advantages of AT-UCB theoretically, through a novel upper bound on the cumulative regret, and empirically, by applying AT-UCB to epidemiological simulators with varying resolution and computational cost.

abstraction, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2509.04296

Country: Europe > United Kingdom > England (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.94)

Add feedback

Heads or Tails: A Simple Example of Causal Abstractive Simulation

Simmons, Gabriel

arXiv.org Artificial IntelligenceSep-3-2025

This note illustrates how a variety of causal abstraction arXiv:1707.00819 arXiv:1812.03789, defined here as causal abstractive simulation, can be used to formalize a simple example of language model simulation. This note considers the case of simulating a fair coin toss with a language model. Examples are presented illustrating the ways language models can fail to simulate, and a success case is presented, illustrating how this formalism may be used to prove that a language model simulates some other system, given a causal description of the system. This note may be of interest to three groups. For practitioners in the growing field of language model simulation, causal abstractive simulation is a means to connect ad-hoc statistical benchmarking practices to the solid formal foundation of causality. Philosophers of AI and philosophers of mind may be interested as causal abstractive simulation gives a precise operationalization to the idea that language models are role-playing arXiv:2402.12422. Mathematicians and others working on causal abstraction may be interested to see a new application of the core ideas that yields a new variation of causal abstraction.

artificial intelligence, natural language, observer, (14 more...)

arXiv.org Artificial Intelligence

2509.01136

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Identifiability in Causal Abstractions: A Hierarchy of Criteria

Yvernes, Clément, Devijver, Emilie, Clausel, Marianne, Gaussier, Eric

arXiv.org Artificial IntelligenceJul-9-2025

Identifying the effect of a treatment from observational data typically requires assuming a fully specified causal diagram. However, such diagrams are rarely known in practice, especially in complex or high-dimensional settings. To overcome this limitation, recent works have explored the use of causal abstractions-simplified representations that retain partial causal information. In this paper, we consider causal abstractions formalized as collections of causal diagrams, and focus on the identifiability of causal queries within such collections. We introduce and formalize several identifiability criteria under this setting. Our main contribution is to organize these criteria into a structured hierarchy, highlighting their relationships. This hierarchical view enables a clearer understanding of what can be identified under varying levels of causal knowledge. We illustrate our framework through examples from the literature and provide tools to reason about identifiability when full causal knowledge is unavailable.

artificial intelligence, identifiability, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2507.06213

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
Information Technology > Artificial Intelligence > Machine Learning (0.68)

Add feedback